Selective Sampling Methods in One-Class Classification Problems
Abstract
Selective sampling, a part of the active learning method, reduces the cost of labeling supplementary training data by asking only for the labels of the most informative unlabeled examples. This additional information, added to an initial, randomly chosen training set, is expected to improve the generalization performance of a learning machine. We investigate some methods for selecting the most informative examples in the context of one-class classification (OCC) problems, i.e., problems where only (or nearly only) examples of the so-called target class are available. We applied selective sampling algorithms to a variety of domains, including two real-world problems: mine detection and texture segmentation. The goal of this paper is to show why the best or most often used selective sampling methods for two- or multi-class problems are not necessarily the best ones for the one-class classification problem. By modifying the sampling methods, we present a way of selecting a small subset of the unlabeled data to be presented to an expert for labeling such that the performance of the retrained one-class classifier is significantly improved.
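To make the idea of asking only for the labels of the most informative unlabeled examples concrete, here is a minimal sketch of generic uncertainty sampling around a one-class boundary. It is not the method proposed in the paper, which argues that such standard criteria are not necessarily the best choice for one-class problems. The toy classifier (a centroid plus a distance threshold), the function names `fit_one_class` and `select_queries`, the `quantile` and `k` parameters, and the data are all assumptions made for illustration.

```python
import math

def fit_one_class(targets, quantile=0.9):
    """Fit a toy one-class model: a centroid plus a distance threshold.

    Hypothetical stand-in for a real one-class classifier.
    """
    dim = len(targets[0])
    center = [sum(p[i] for p in targets) / len(targets) for i in range(dim)]
    dists = sorted(math.dist(p, center) for p in targets)
    # Pick the threshold so that roughly `quantile` of the targets are accepted.
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return center, dists[idx]

def select_queries(unlabeled, center, threshold, k=2):
    """Return indices of the k unlabeled points closest to the boundary."""
    def margin(i):
        # Small margin = the model is least certain about this point.
        return abs(math.dist(unlabeled[i], center) - threshold)
    return sorted(range(len(unlabeled)), key=margin)[:k]

# Initial training set: target-class examples only.
targets = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
# Pool of unlabeled examples an expert could be asked about.
unlabeled = [(0.5, 0.4), (1.2, 0.5), (3.0, 3.0), (0.5, 1.3)]

center, threshold = fit_one_class(targets)
queries = select_queries(unlabeled, center, threshold)
print(queries)  # indices of the examples to send to the expert
```

In an active learning loop, the selected examples would be labeled by the expert, added to the training set, and the one-class classifier retrained. Swapping the `margin` criterion for a different informativeness score is the part the abstract suggests deserves care in the one-class setting.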
Similar resources
Uncertainty sampling methods for one-class classifiers
Selective sampling, a part of the active learning method, reduces the cost of labeling supplementary training data by asking for the labels only of the most informative, unlabeled examples. This additional information added to an initial, randomly chosen training set is expected to improve the generalization performance of a learning machine. We investigate some methods for a selection of the m...
Improving Imbalanced data classification accuracy by using Fuzzy Similarity Measure and subtractive clustering
Classification is one of the important parts of data mining and knowledge discovery. In most cases, the data used for training the clusters is not well distributed. This inappropriate distribution occurs when one class has a large number of samples while the number of samples in the other class is inherently low. In general, the methods of solving this kind of prob...
A Novel One Sided Feature Selection Method for Imbalanced Text Classification
Imbalanced data can be seen in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. Classification algorithms tend to favor the large class and might even treat minority-class data as outliers. The text data is one of t...
Sampling Methods in Genetic Programming for Classification with Unbalanced Data
This work investigates the use of sampling methods in Genetic Programming (GP) to improve the classification accuracy in binary classification problems in which the datasets have a class imbalance. Class imbalance occurs when there are more data instances in one class than the other. As a consequence of this imbalance, when overall classification rate is used as the fitness function, as in stan...
Detection of Fake Accounts in Social Networks Based on One Class Classification
Detection of fake accounts on social networks is a challenging process. Previous methods for identifying fake accounts have not considered the strength of the users’ communications, which reduces their efficiency. In this work, we present a detection method based on the users’ similarities that takes the users’ network communications into account. In the first step, similarity ...